Identifying Keystone Species in the Human Gut Microbiome from Metagenomic Timeseries using Sparse Linear Regression
Human-associated microbial communities exert tremendous influence over human
health and disease. With modern metagenomic sequencing methods it is possible
to follow the relative abundance of microbes in a community over time. These
microbial communities exhibit rich ecological dynamics and an important goal of
microbial ecology is to infer the interactions between species from sequence
data. Any algorithm for inferring species interactions must overcome three
obstacles: 1) a correlation between the abundances of two species does not
imply that those species are interacting, 2) the sum constraint on the relative
abundances obtained from metagenomic studies makes it difficult to infer the
parameters in timeseries models, and 3) errors due to experimental uncertainty,
or mis-assignment of sequencing reads into operational taxonomic units, bias
inferences of species interactions. Here we introduce an approach, Learning
Interactions from MIcrobial Time Series (LIMITS), that overcomes these
obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to
infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested
LIMITS on synthetic data and showed that it could reliably infer the topology
of the inter-species ecological interactions. We then used LIMITS to
characterize the species interactions in the gut microbiomes of two individuals
and found that the interaction networks varied significantly between
individuals. Furthermore, we found that the interaction networks of the two
individuals are dominated by distinct "keystone species", Bacteroides fragilis
and Bacteroides stercoris, that have a disproportionate influence on the
structure of the gut microbiome even though they are only found in moderate
abundance. Based on our results, we hypothesize that the abundances of certain
keystone species may be responsible for individuality in the human gut
microbiome.
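The regression idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made here for illustration: the Ricker-type form of the discrete-time Lotka-Volterra map, forward stepwise selection as the sparse regression step, random half/half splits with a median as the aggregation step, the `min_gain` threshold, and all function names. It also works with absolute abundances, so it ignores the sum constraint on relative abundances that the abstract lists as an obstacle.

```python
import math
import random
from statistics import median

def simulate(C, growth, x0, steps, noise, rng):
    """Assumed model: x_i(t+1) = x_i(t) * exp(growth_i + sum_j C[i][j]*x_j(t) + noise)."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append([x[i] * math.exp(growth[i]
                                   + sum(C[i][j] * x[j] for j in range(len(x)))
                                   + noise * rng.gauss(0.0, 1.0))
                   for i in range(len(x))])
    return xs

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * p for a, p in zip(M[k], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit(X, y, cols):
    """Ordinary least squares restricted to the columns in `cols`."""
    A = [[sum(row[a] * row[b] for row in X) for b in cols] for a in cols]
    g = [sum(row[a] * t for row, t in zip(X, y)) for a in cols]
    return dict(zip(cols, solve(A, g)))

def mse(X, y, coef):
    """Mean squared prediction error of a {column: coefficient} model."""
    return sum((t - sum(c * row[j] for j, c in coef.items())) ** 2
               for row, t in zip(X, y)) / len(y)

def forward_stepwise(Xtr, ytr, Xte, yte, min_gain=0.03):
    """Add the regressor that most reduces held-out error; stop when the
    relative improvement drops below `min_gain` (an arbitrary choice)."""
    n = len(Xtr[0])
    active = [0]                                  # column 0 is the intercept
    coef = fit(Xtr, ytr, active)
    err = mse(Xte, yte, coef)
    while len(active) < n:
        trials = []
        for j in range(1, n):
            if j not in active:
                c = fit(Xtr, ytr, active + [j])
                trials.append((mse(Xte, yte, c), j, c))
        best_err, best_j, best_coef = min(trials, key=lambda t: t[0])
        if err - best_err < min_gain * err:
            break
        active, coef, err = active + [best_j], best_coef, best_err
    return [coef.get(j, 0.0) for j in range(n)]

def bagged_row(X, y, n_bags, rng):
    """Median of stepwise fits over random train/test splits."""
    idx = list(range(len(y)))
    fits = []
    for _ in range(n_bags):
        rng.shuffle(idx)
        tr, te = idx[:len(idx) // 2], idx[len(idx) // 2:]
        fits.append(forward_stepwise([X[t] for t in tr], [y[t] for t in tr],
                                     [X[t] for t in te], [y[t] for t in te]))
    return [median(col) for col in zip(*fits)]

rng = random.Random(0)
# Hypothetical 3-species interaction matrix used only to generate test data.
C_true = [[-1.0, 0.0, 0.6], [0.0, -1.0, 0.0], [-0.6, 0.0, -1.0]]
xs = simulate(C_true, growth=[0.5, 1.0, 1.5], x0=[1.0, 1.0, 1.0],
              steps=400, noise=0.2, rng=rng)
X = [[1.0] + x for x in xs[:-1]]                  # intercept + abundances at t
C_hat = []
for i in range(3):
    # Log-ratio of successive abundances is linear in x(t) under the assumed model.
    y = [math.log(nxt[i] / cur[i]) for cur, nxt in zip(xs[:-1], xs[1:])]
    C_hat.append(bagged_row(X, y, n_bags=20, rng=rng)[1:])   # drop intercept
for row in C_hat:
    print([round(v, 2) for v in row])
```

Taking the median across splits means an interaction that is selected in only a minority of splits is reported as zero, which is one simple way to favor a sparse network over spurious correlations.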
A Rule of Thumb for the Power Gain due to Covariate Adjustment in Randomized Controlled Trials with Continuous Outcomes
Randomized Controlled Trials (RCTs) often adjust for baseline covariates in
order to increase power. This technical note provides a short derivation of a
simple rule of thumb for approximating the ratio of the power of an adjusted
analysis to that of an unadjusted analysis. Specifically, if the unadjusted
analysis is powered to approximately 80%, then the ratio of the power of the
adjusted analysis to the power of the unadjusted analysis is approximately
$1 + \frac{1}{2} R^2$, where $R$ is the correlation between the baseline
covariate and the outcome.
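A quick numerical check of a rule of this kind can be sketched as follows. The setup is an assumption rather than the note's own derivation: a two-sided z-test at level 0.05, a normal approximation to the test statistic, and a covariate that reduces the residual variance by a factor of 1 - R^2. Under those assumptions a first-order expansion around 80% power gives a ratio close to 1 + R^2/2, which the code compares against the directly computed ratio. The function name `power_ratio` is hypothetical.

```python
from statistics import NormalDist

def power_ratio(r, unadjusted_power=0.8, alpha=0.05):
    """Ratio of adjusted to unadjusted power under a normal approximation."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Noncentrality that yields the requested unadjusted power
    # (the negligible opposite-tail rejection probability is ignored).
    ncp = z_alpha + std_normal.inv_cdf(unadjusted_power)
    # Assumption: adjustment shrinks the residual variance by (1 - r^2),
    # which inflates the noncentrality by 1 / sqrt(1 - r^2).
    adjusted_power = std_normal.cdf(ncp / (1 - r ** 2) ** 0.5 - z_alpha)
    return adjusted_power / unadjusted_power

for r in (0.1, 0.3, 0.5):
    print(f"R={r:.1f}  computed ratio={power_ratio(r):.4f}  "
          f"1 + R^2/2 = {1 + 0.5 * r ** 2:.4f}")
```

The approximation is only expected to hold near 80% unadjusted power; since power cannot exceed 1, the ratio is bounded by 1.25 there, so the quadratic expression must overshoot for correlations above roughly 0.7.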
Can RBMs be trained with zero step contrastive divergence?
Restricted Boltzmann Machines (RBMs) are probabilistic generative models that
can be trained by maximum likelihood in principle, but are usually trained by
an approximate algorithm called Contrastive Divergence (CD) in practice. In
general, a CD-k algorithm estimates an average with respect to the model
distribution using a sample obtained from a k-step Markov chain Monte Carlo
algorithm (e.g., block Gibbs sampling) starting from some initial
configuration. Choices of k typically vary from 1 to 100. This technical report
explores whether it is possible to leverage a simple approximate sampling algorithm
with a modified version of CD in order to train an RBM with k=0. As usual, the
method is illustrated on MNIST.
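For orientation, a standard CD-k update for a small binary RBM can be sketched as below. This is the textbook algorithm with block Gibbs sampling, not the modified zero-step method of the report, whose details are not given in the abstract. The function names, the toy data, the learning rate, and the choice to start the chain at the training vector are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, c, rng):
    """Return (p(h=1 | v), a binary sample of h) for a binary RBM."""
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(c))]
    return probs, [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, b, rng):
    """Return (p(v=1 | h), a binary sample of v) for a binary RBM."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    return probs, [1 if rng.random() < p else 0 for p in probs]

def cd_k_update(v0, W, b, c, k=1, lr=0.1, rng=random):
    """One in-place CD-k update from a single training vector v0.

    The chain starts at v0 and runs k steps of block Gibbs sampling; the
    final state supplies the negative-phase statistics.
    """
    ph0, h = sample_hidden(v0, W, c, rng)
    vk, phk = v0, ph0
    for _ in range(k):
        _, vk = sample_visible(h, W, b, rng)
        phk, h = sample_hidden(vk, W, c, rng)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - vk[i] * phk[j])
        b[i] += lr * (v0[i] - vk[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - phk[j])

# Toy usage: 4 visible units, 2 hidden units, two training patterns.
rng = random.Random(0)
W = [[0.01 * rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(4)]
b, c = [0.0] * 4, [0.0] * 2
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
for _ in range(500):
    for v in data:
        cd_k_update(v, W, b, c, k=1, rng=rng)
for v in data:
    _, h = sample_hidden(v, W, c, rng)
    probs, _ = sample_visible(h, W, b, rng)
    print(v, [round(p, 2) for p in probs])
```

In this plain form, setting `k=0` skips the Gibbs loop, so the negative-phase statistics equal the positive-phase ones and the update is exactly zero. That is presumably why training with k=0 requires a different source of samples and a modified update, as the report proposes.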
Constructing ensembles for intrinsically disordered proteins
The relatively flat energy landscapes associated with intrinsically disordered proteins make modeling these systems especially problematic. A comprehensive model for these proteins requires one to build an ensemble consisting of a finite collection of structures, and their corresponding relative stabilities, which adequately captures the range of accessible states of the protein. In this regard, methods that use computational techniques to interpret experimental data in terms of such ensembles are an essential part of the modeling process. In this review, we critically assess the advantages and limitations of current techniques and discuss new methods for the validation of these ensembles.